Yong Jae Lee

Agentic Very Long Video Understanding

Jan 26, 2026

VideoWeave: A Data-Centric Approach for Efficient Video Understanding

Jan 09, 2026

Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

Dec 11, 2025

Relational Visual Similarity

Dec 08, 2025

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Nov 17, 2025

Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

Nov 05, 2025

Real Deep Research for AI, Robotics and Beyond

Oct 23, 2025

CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems

Jun 09, 2025

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

May 28, 2025

Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models

May 26, 2025